87 research outputs found
Segmentation Based Mesh Denoising
Feature-preserving mesh denoising has received noticeable attention recently.
Many methods assign large weights to anisotropic surfaces and small weights to
isotropic surfaces in order to preserve sharp features. However, they often
disregard the fact that even small weights still have a negative impact on the
denoising outcome. Such weighting may also make parameter tuning more
difficult, especially for users without any background knowledge. In this paper,
we propose a novel clustering method for mesh denoising, which can avoid the
disturbance of anisotropic information and be easily embedded into
commonly-used mesh denoising frameworks. Extensive experiments have been
conducted to validate our method, and demonstrate that it can enhance the
denoising results of some existing methods remarkably both visually and
quantitatively. It also largely eases parameter tuning for users by making
existing mesh denoising methods more stable.
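The point that small weights still affect the result can be illustrated with a toy normal filter. This is only a sketch under stated assumptions, not the method of the paper: the Gaussian weighting, the function `filter_normal`, and the per-neighbor `labels` are hypothetical names introduced here. It compares a weighted average over all neighboring face normals with an average restricted to neighbors in the same cluster.

```python
import math

def normalize(v, fallback):
    """Return v scaled to unit length, or `fallback` if v is the zero vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(fallback)

def filter_normal(target, neighbors, sigma, labels=None, target_label=None):
    """Gaussian-weighted average of neighbor normals around `target`.

    Assumption: weights depend only on the normal difference. If `labels`
    is given, neighbors whose cluster label differs from `target_label`
    are skipped entirely instead of receiving a small weight.
    """
    acc = [0.0, 0.0, 0.0]
    for i, n in enumerate(neighbors):
        if labels is not None and labels[i] != target_label:
            continue  # excluded by the (hypothetical) segmentation
        d2 = sum((a - b) ** 2 for a, b in zip(target, n))
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        acc = [a + w * x for a, x in zip(acc, n)]
    return normalize(acc, target)

# A face on the top of a sharp edge: three neighbors share its normal,
# three lie on the other side of the edge.
target = [0.0, 0.0, 1.0]
neighbors = [[0.0, 0.0, 1.0]] * 3 + [[1.0, 0.0, 0.0]] * 3
labels = [0, 0, 0, 1, 1, 1]

weighted = filter_normal(target, neighbors, sigma=1.0)
clustered = filter_normal(target, neighbors, sigma=1.0,
                          labels=labels, target_label=0)
```

With a loosely tuned `sigma`, the weighted result tilts toward the other side of the edge, while the cluster-restricted result keeps the original normal regardless of `sigma`.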
In-Place Gestures Classification via Long-term Memory Augmented Network
In-place gesture-based virtual locomotion techniques enable users to control
their viewpoint and intuitively move in the 3D virtual environment. A key
research problem is to accurately and quickly recognize in-place gestures,
since they can trigger specific movements of virtual viewpoints and enhance
user experience. However, to achieve real-time experience, only short-term
sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input,
which actually affects the classification performance due to limited
spatio-temporal information. In this paper, we propose a novel long-term memory
augmented network for in-place gestures classification. It takes as input both
short-term gesture sequence samples and their corresponding long-term sequence
samples that provide extra relevant spatio-temporal information in the training
phase. We store long-term sequence features with an external memory queue. In
addition, we design a memory augmented loss to help cluster features of the
same class and push apart features from different classes, thus enabling our
memory queue to memorize more relevant long-term sequence features. In the
inference phase, we input only short-term sequence samples to recall the stored
features accordingly, and fuse them together to predict the gesture class. We
create a large-scale in-place gestures dataset from 25 participants with 11
gestures. Our method achieves a promising accuracy of 95.1% with a latency of
192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to
be superior to recent in-place gesture classification techniques. A user study
also validates our approach. Our source code and dataset will be made available
to the community.
Comment: This paper is accepted to IEEE ISMAR202
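The store-then-recall idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `MemoryQueue`, the cosine-similarity softmax used for recall, the `temperature` parameter, and fusion by concatenation are all assumptions made here for the example.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

class MemoryQueue:
    """Fixed-size FIFO store of long-term sequence features (hypothetical)."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry when full
        self.items = deque(maxlen=capacity)

    def push(self, feature):
        """Training phase: store one long-term sequence feature."""
        self.items.append(list(feature))

    def recall(self, query, temperature=0.1):
        """Inference phase: softmax-weighted average of stored features,
        weighted by cosine similarity to the short-term query feature."""
        if not self.items:
            return [0.0] * len(query)
        sims = [cosine(query, f) / temperature for f in self.items]
        m = max(sims)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in sims]
        z = sum(weights)
        return [sum(w * f[d] for w, f in zip(weights, self.items)) / z
                for d in range(len(query))]

def fuse(short_feature, recalled_feature):
    """Assumed fusion: concatenate short-term and recalled features."""
    return list(short_feature) + list(recalled_feature)

queue = MemoryQueue(capacity=2)
for feat in ([0.0, 1.0], [1.0, 0.0], [0.0, 1.0]):
    queue.push(feat)          # the first feature is evicted by the third

recalled = queue.recall([1.0, 0.0])
fused = fuse([1.0, 0.0], recalled)
```

In a real network the fused vector would feed a classifier head; here it only shows how a short-term feature can retrieve related long-term information at inference time without the long-term input itself.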
Masked Autoencoders in 3D Point Cloud Representation Learning
Transformer-based Self-supervised Representation Learning methods learn
generic features from unlabeled datasets for providing useful network
initialization parameters for downstream tasks. However, self-supervised
learning based on masking local surface patches of 3D point cloud data remains
under-explored. In this paper, we propose masked Autoencoders in 3D point
cloud representation learning (abbreviated as MAE3D), a novel autoencoding
paradigm for self-supervised learning. We first split the input point cloud
into patches and mask a portion of them, then use our Patch Embedding Module to
extract the features of unmasked patches. Secondly, we employ patch-wise MAE3D
Transformers to learn both local features of point cloud patches and high-level
contextual relationships between patches and complete the latent
representations of masked patches. Finally, our Point Cloud Reconstruction
Module uses a multi-task loss to complete the incomplete point cloud. We
conduct self-supervised pre-training on ShapeNet55 with the point cloud
completion pre-text task and fine-tune the pre-trained model on ModelNet40 and
ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments
demonstrate that the local features extracted by our MAE3D from point cloud
patches are beneficial for downstream classification tasks, soundly
outperforming state-of-the-art methods ( and classification
accuracy, respectively).
Comment: Accepted to IEEE Transactions on Multimedia
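The first step of the pipeline, splitting a point cloud into patches and masking a portion of them, might look like the sketch below. The abstract does not say how patches are formed, so farthest point sampling for patch centers and nearest-neighbor grouping are assumptions borrowed from common point cloud practice; the function names and the `mask_ratio` value are hypothetical.

```python
import math
import random

def farthest_point_sampling(points, k, start=0):
    """Pick k point indices that are spread out over the cloud."""
    chosen = [start]
    # distance from every point to its nearest chosen center so far
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt]))
                for d, p in zip(dist, points)]
    return chosen

def make_patches(points, num_patches, patch_size):
    """Group the patch_size nearest points around each sampled center."""
    centers = farthest_point_sampling(points, num_patches)
    patches = []
    for c in centers:
        order = sorted(range(len(points)),
                       key=lambda i: math.dist(points[i], points[c]))
        patches.append(order[:patch_size])
    return patches

def mask_patches(num_patches, mask_ratio, rng):
    """Randomly split patch indices into visible and masked sets."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)

rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
patches = make_patches(cloud, num_patches=8, patch_size=16)
visible, masked = mask_patches(len(patches), mask_ratio=0.75, rng=rng)
```

Only the visible patches would be passed to an embedding module; the masked ones serve as reconstruction targets for the completion pre-text task.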
Don't worry about mistakes! Glass Segmentation Network via Mistake Correction
Recall one time when we were in an unfamiliar mall. We might mistakenly think
that there exists or does not exist a piece of glass in front of us. Such
mistakes will remind us to walk more safely and freely at the same or a similar
place next time. To absorb the human mistake correction wisdom, we propose a
novel glass segmentation network to detect transparent glass, dubbed
GlassSegNet. Motivated by this human behavior, GlassSegNet utilizes two key
stages: the identification stage (IS) and the correction stage (CS). The IS is
designed to simulate the detection procedure of human recognition for
identifying transparent glass by global context and edge information. The CS
then progressively refines the coarse prediction by correcting mistake regions
based on gained experience. Extensive experiments show clear improvements of
our GlassSegNet over thirty-four state-of-the-art methods on three benchmark
datasets.